{
"cells": [
{
"cell_type": "markdown",
"id": "9cc3b252-0afc-404f-a7c2-706d0a7e3c89",
"metadata": {
"editable": true,
"id": "9cc3b252-0afc-404f-a7c2-706d0a7e3c89",
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"source": [
"### Asking scientific questions of models - Exercises & Answers"
]
},
{
"cell_type": "markdown",
"id": "a734dcce-afbf-409a-8c38-ce4a535f4ea5",
"metadata": {
"id": "a734dcce-afbf-409a-8c38-ce4a535f4ea5"
},
"source": [
"The exercises here are designed to get you comfortable using models to make predictions and having them answer questions of interest, as opposed to relying on a suite of tests picked from a flowchart."
]
},
{
"cell_type": "markdown",
"id": "0cde8377-c11f-4474-95f2-7e6d353ecba1",
"metadata": {
"id": "0cde8377-c11f-4474-95f2-7e6d353ecba1"
},
"source": [
"## Traditional approaches from a model-based perspective\n",
"To get things clear, lets do some standard approaches such as t-tests and ANOVAs from the perspective of a linear model. We won't interpret the coefficients, we'll just get the model to tell us the answer directly and compare it to the traditional answer.\n",
"\n",
"### a. Imports\n",
"Import `pandas`, `pingouin`, `statsmodels.formula.api`, `seaborn`, and also `marginaleffects`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "83f7ae21-b1f1-4d9b-93b2-720c03bbe75b",
"metadata": {
"editable": true,
"executionInfo": {
"elapsed": 11,
"status": "ok",
"timestamp": 1723926881418,
"user": {
"displayName": "Alex Jones",
"userId": "11094282981700434339"
},
"user_tz": -60
},
"id": "83f7ae21-b1f1-4d9b-93b2-720c03bbe75b",
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# Your answer here\n",
"import pandas as pd\n",
"import pingouin as pg\n",
"import statsmodels.formula.api as smf\n",
"import seaborn as sns\n",
"import marginaleffects as me\n",
"\n",
"sns.set_style('whitegrid')"
]
},
{
"cell_type": "markdown",
"id": "9dc75c2e-25e4-4ed9-8f57-b60a682e038e",
"metadata": {
"id": "9dc75c2e-25e4-4ed9-8f57-b60a682e038e"
},
"source": [
"### b. Loading up data\n",
"We will continue our exploration of the 'Teaching Ratings' dataset here, and use `marginaleffects` to explore the consequences of our models.\n",
"\n",
"The data can be found here: https://vincentarelbundock.github.io/Rdatasets/csv/AER/TeachingRatings.csv\n",
"\n",
"Read it into a dataframe called `profs`, and show the top 5 rows."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a87dbb65-80a5-4a29-a8b6-e26838ab5792",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 224
},
"editable": true,
"executionInfo": {
"elapsed": 10,
"status": "ok",
"timestamp": 1723926881418,
"user": {
"displayName": "Alex Jones",
"userId": "11094282981700434339"
},
"user_tz": -60
},
"id": "a87dbb65-80a5-4a29-a8b6-e26838ab5792",
"outputId": "efe2f990-4fea-415e-f736-d6ae07fc0649",
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
rownames
\n",
"
minority
\n",
"
age
\n",
"
gender
\n",
"
credits
\n",
"
beauty
\n",
"
eval
\n",
"
division
\n",
"
native
\n",
"
tenure
\n",
"
students
\n",
"
allstudents
\n",
"
prof
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
1
\n",
"
yes
\n",
"
36
\n",
"
female
\n",
"
more
\n",
"
0.289916
\n",
"
4.3
\n",
"
upper
\n",
"
yes
\n",
"
yes
\n",
"
24
\n",
"
43
\n",
"
1
\n",
"
\n",
"
\n",
"
1
\n",
"
2
\n",
"
no
\n",
"
59
\n",
"
male
\n",
"
more
\n",
"
-0.737732
\n",
"
4.5
\n",
"
upper
\n",
"
yes
\n",
"
yes
\n",
"
17
\n",
"
20
\n",
"
2
\n",
"
\n",
"
\n",
"
2
\n",
"
3
\n",
"
no
\n",
"
51
\n",
"
male
\n",
"
more
\n",
"
-0.571984
\n",
"
3.7
\n",
"
upper
\n",
"
yes
\n",
"
yes
\n",
"
55
\n",
"
55
\n",
"
3
\n",
"
\n",
"
\n",
"
3
\n",
"
4
\n",
"
no
\n",
"
40
\n",
"
female
\n",
"
more
\n",
"
-0.677963
\n",
"
4.3
\n",
"
upper
\n",
"
yes
\n",
"
yes
\n",
"
40
\n",
"
46
\n",
"
4
\n",
"
\n",
"
\n",
"
4
\n",
"
5
\n",
"
no
\n",
"
31
\n",
"
female
\n",
"
more
\n",
"
1.509794
\n",
"
4.4
\n",
"
upper
\n",
"
yes
\n",
"
yes
\n",
"
42
\n",
"
48
\n",
"
5
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" rownames minority age gender credits beauty eval division native \\\n",
"0 1 yes 36 female more 0.289916 4.3 upper yes \n",
"1 2 no 59 male more -0.737732 4.5 upper yes \n",
"2 3 no 51 male more -0.571984 3.7 upper yes \n",
"3 4 no 40 female more -0.677963 4.3 upper yes \n",
"4 5 no 31 female more 1.509794 4.4 upper yes \n",
"\n",
" tenure students allstudents prof \n",
"0 yes 24 43 1 \n",
"1 yes 17 20 2 \n",
"2 yes 55 55 3 \n",
"3 yes 40 46 4 \n",
"4 yes 42 48 5 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Your answer here\n",
"# Read in dataset\n",
"profs = pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/AER/TeachingRatings.csv')\n",
"profs.head()"
]
},
{
"cell_type": "markdown",
"id": "76eeeca9-97b3-463a-b485-5878d5b0124d",
"metadata": {
"id": "76eeeca9-97b3-463a-b485-5878d5b0124d"
},
"source": [
"### c. The t-test as a marginal effect\n",
"We will recreate a t-test with model-based predictions.\n",
"\n",
"First, conduct a t-test with `pingouin`, comparing the evaluation score of male and female professors."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "80728e27-f1d7-4b6d-8e87-9d9a7676463d",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 114
},
"editable": true,
"executionInfo": {
"elapsed": 8,
"status": "ok",
"timestamp": 1723926881418,
"user": {
"displayName": "Alex Jones",
"userId": "11094282981700434339"
},
"user_tz": -60
},
"id": "80728e27-f1d7-4b6d-8e87-9d9a7676463d",
"outputId": "61160ca2-552d-4cbc-86a0-e110873cf56c",
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
T
\n",
"
dof
\n",
"
alternative
\n",
"
p-val
\n",
"
CI95%
\n",
"
cohen-d
\n",
"
BF10
\n",
"
power
\n",
"
\n",
" \n",
" \n",
"
\n",
"
T-test
\n",
"
-3.266711
\n",
"
425.755804
\n",
"
two-sided
\n",
"
0.001176
\n",
"
[-0.27, -0.07]
\n",
"
0.305901
\n",
"
17.548
\n",
"
0.900288
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" T dof alternative p-val CI95% cohen-d \\\n",
"T-test -3.266711 425.755804 two-sided 0.001176 [-0.27, -0.07] 0.305901 \n",
"\n",
" BF10 power \n",
"T-test 17.548 0.900288 "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Your answer here\n",
"# T-test with pingouin\n",
"pg.ttest(profs.query('gender == \"female\"')['eval'],\n",
" profs.query('gender == \"male\"')['eval']\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "7af5a6c2-aa6b-4748-a1ed-194884866b78",
"metadata": {
"id": "7af5a6c2-aa6b-4748-a1ed-194884866b78"
},
"source": [
"Now fit a regression model with `statsmodels` predicting evaluations from gender. Call the model `ttest`. Check the summary, and remember the coefficient will equal the mean difference, which we can check our predictions against."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "69dadb4a-7a67-4d8a-8735-fad9245542c8",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 253
},
"editable": true,
"executionInfo": {
"elapsed": 7,
"status": "ok",
"timestamp": 1723926881418,
"user": {
"displayName": "Alex Jones",
"userId": "11094282981700434339"
},
"user_tz": -60
},
"id": "69dadb4a-7a67-4d8a-8735-fad9245542c8",
"outputId": "dc7f5e9e-ef3a-46b3-dc75-a56ee5b415dc",
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
OLS Regression Results
\n",
"
\n",
"
Dep. Variable:
eval
R-squared:
0.022
\n",
"
\n",
"
\n",
"
Model:
OLS
Adj. R-squared:
0.020
\n",
"
\n",
"
\n",
"
No. Observations:
463
F-statistic:
10.56
\n",
"
\n",
"
\n",
"
Covariance Type:
nonrobust
Prob (F-statistic):
0.00124
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
coef
std err
t
P>|t|
[0.025
0.975]
\n",
"
\n",
"
\n",
"
Intercept
3.9010
0.039
99.187
0.000
3.824
3.978
\n",
"
\n",
"
\n",
"
gender[T.male]
0.1680
0.052
3.250
0.001
0.066
0.270
\n",
"
\n",
"
Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/latex": [
"\\begin{center}\n",
"\\begin{tabular}{lclc}\n",
"\\toprule\n",
"\\textbf{Dep. Variable:} & eval & \\textbf{ R-squared: } & 0.022 \\\\\n",
"\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.020 \\\\\n",
"\\textbf{No. Observations:} & 463 & \\textbf{ F-statistic: } & 10.56 \\\\\n",
"\\textbf{Covariance Type:} & nonrobust & \\textbf{ Prob (F-statistic):} & 0.00124 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"\\begin{tabular}{lcccccc}\n",
" & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n",
"\\midrule\n",
"\\textbf{Intercept} & 3.9010 & 0.039 & 99.187 & 0.000 & 3.824 & 3.978 \\\\\n",
"\\textbf{gender[T.male]} & 0.1680 & 0.052 & 3.250 & 0.001 & 0.066 & 0.270 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"%\\caption{OLS Regression Results}\n",
"\\end{center}\n",
"\n",
"Notes: \\newline\n",
" [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/plain": [
"\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: eval R-squared: 0.022\n",
"Model: OLS Adj. R-squared: 0.020\n",
"No. Observations: 463 F-statistic: 10.56\n",
"Covariance Type: nonrobust Prob (F-statistic): 0.00124\n",
"==================================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"----------------------------------------------------------------------------------\n",
"Intercept 3.9010 0.039 99.187 0.000 3.824 3.978\n",
"gender[T.male] 0.1680 0.052 3.250 0.001 0.066 0.270\n",
"==================================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"\"\"\""
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Your answer here\n",
"# Fit the model\n",
"ttest = smf.ols('eval ~ gender', data=profs).fit()\n",
"ttest.summary(slim=True)"
]
},
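{
"cell_type": "markdown",
"id": "sketch-dummy-coef-mean-diff",
"metadata": {},
"source": [
"The claim that the `gender` coefficient equals the mean difference can be checked on a toy example. The sketch below is not part of the exercise: it uses made-up scores (the arrays `female` and `male` are hypothetical) and plain `numpy` least squares rather than `statsmodels`. It shows that regressing on a 0/1 dummy returns the reference-group mean as the intercept and the difference in group means as the slope.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical evaluation scores for two groups (illustrative values only)\n",
"female = np.array([3.8, 4.1, 3.6, 4.0])\n",
"male = np.array([4.2, 4.4, 3.9, 4.3, 4.0])\n",
"\n",
"# Stack the outcome and build a 0/1 dummy: 0 = female (reference), 1 = male\n",
"y = np.concatenate([female, male])\n",
"dummy = np.concatenate([np.zeros(len(female)), np.ones(len(male))])\n",
"\n",
"# Design matrix with an intercept column, solved by ordinary least squares\n",
"X = np.column_stack([np.ones(len(y)), dummy])\n",
"intercept, slope = np.linalg.lstsq(X, y, rcond=None)[0]\n",
"\n",
"mean_difference = male.mean() - female.mean()\n",
"print(intercept, female.mean())   # intercept matches the reference-group mean\n",
"print(slope, mean_difference)     # slope matches the difference in means\n",
"```\n",
"\n",
"In the real data the estimate is the same in both analyses. The small gap between the pingouin statistic (T = -3.27) and the regression statistic (t = 3.25) most likely arises because pingouin applied Welch's correction for unequal variances, as the fractional degrees of freedom suggest, whereas OLS pools the variance. The sign flips only because the female group was passed first."
]
},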
{
"cell_type": "markdown",
"id": "309098eb-c5fa-4346-8302-2c8449632495",
"metadata": {
"id": "309098eb-c5fa-4346-8302-2c8449632495"
},
"source": [
"Next, use `marginaleffects` to create a datagrid that will give predictions for female and male professors, and pass it to `me.predictions` to make the predictions. Examine the values."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "e68cdbde-8412-462d-b45e-fcf709251814",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 213
},
"editable": true,
"executionInfo": {
"elapsed": 370,
"status": "ok",
"timestamp": 1723926881781,
"user": {
"displayName": "Alex Jones",
"userId": "11094282981700434339"
},
"user_tz": -60
},
"id": "e68cdbde-8412-462d-b45e-fcf709251814",
"outputId": "5c953256-76f9-4720-c955-ef3df126a751",
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"
"
],
"text/plain": [
"shape: (1, 8)\n",
"┌───────────────┬──────────┬───────────┬──────┬─────────┬──────┬────────┬───────┐\n",
"│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n",
"╞═══════════════╪══════════╪═══════════╪══════╪═════════╪══════╪════════╪═══════╡\n",
"│ Row 1 - Row 2 ┆ 0.168 ┆ 0.0517 ┆ 3.25 ┆ 0.00115 ┆ 9.76 ┆ 0.0667 ┆ 0.269 │\n",
"└───────────────┴──────────┴───────────┴──────┴─────────┴──────┴────────┴───────┘\n",
"\n",
"Columns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Your answer here\n",
"# Comparison is done via hypothesis\n",
"me.predictions(ttest, newdata=datagrid, hypothesis='pairwise')"
]
},
{
"cell_type": "markdown",
"id": "7af377ff-fbb7-4597-88ee-756f80eda804",
"metadata": {
"id": "7af377ff-fbb7-4597-88ee-756f80eda804"
},
"source": [
"### d. Carrying out an ANOVA with linear models and marginal effects\n",
"Lets now demonstrate how an ANOVA can be executed easily with a linear model and the examination of marginal effects.\n",
"\n",
"First, use `pinoguin` to carry out an ANOVA on teaching evaluations, using tenure and gender as the factors - that is, examine whether male and female professors differ in their evaluations depending on whether they have achieved tenure or not."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "ece27b38-a2db-48f8-97c6-e8a636037065",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 173
},
"editable": true,
"executionInfo": {
"elapsed": 8,
"status": "ok",
"timestamp": 1723926881781,
"user": {
"displayName": "Alex Jones",
"userId": "11094282981700434339"
},
"user_tz": -60
},
"id": "ece27b38-a2db-48f8-97c6-e8a636037065",
"outputId": "79d963c0-ca43-41c0-d431-4d41fcd2bd8b",
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Source
\n",
"
SS
\n",
"
DF
\n",
"
MS
\n",
"
F
\n",
"
p-unc
\n",
"
np2
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
gender
\n",
"
3.628914
\n",
"
1.0
\n",
"
3.628914
\n",
"
12.615338
\n",
"
0.000422
\n",
"
0.026749
\n",
"
\n",
"
\n",
"
1
\n",
"
tenure
\n",
"
2.829395
\n",
"
1.0
\n",
"
2.829395
\n",
"
9.835936
\n",
"
0.001821
\n",
"
0.020979
\n",
"
\n",
"
\n",
"
2
\n",
"
gender * tenure
\n",
"
4.187913
\n",
"
1.0
\n",
"
4.187913
\n",
"
14.558608
\n",
"
0.000154
\n",
"
0.030743
\n",
"
\n",
"
\n",
"
3
\n",
"
Residual
\n",
"
132.035435
\n",
"
459.0
\n",
"
0.287659
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Source SS DF MS F p-unc np2\n",
"0 gender 3.628914 1.0 3.628914 12.615338 0.000422 0.026749\n",
"1 tenure 2.829395 1.0 2.829395 9.835936 0.001821 0.020979\n",
"2 gender * tenure 4.187913 1.0 4.187913 14.558608 0.000154 0.030743\n",
"3 Residual 132.035435 459.0 0.287659 NaN NaN NaN"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Your answer here\n",
"# A Pingouin ANOVA\n",
"pg.anova(data=profs, dv='eval', between=['gender', 'tenure'])"
]
},
{
"cell_type": "markdown",
"id": "b6b860ac-19c3-4e85-8a55-2613a24b9f2a",
"metadata": {
"id": "b6b860ac-19c3-4e85-8a55-2613a24b9f2a"
},
"source": [
"This suggests there is a main effect of gender, tenure and an interaction. Usually we'd need to do post-hoc tests to explore these. But we can rely on marginal effects for a simpler interpretation. First, fit a linear regression that is the same as the ANOVA, predicting evaluation measures from gender, tenure, and its interaction. Call it an `anova_model`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "5f87872b-c485-41e2-818b-e52a298b17a2",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 295
},
"editable": true,
"executionInfo": {
"elapsed": 7,
"status": "ok",
"timestamp": 1723926881782,
"user": {
"displayName": "Alex Jones",
"userId": "11094282981700434339"
},
"user_tz": -60
},
"id": "5f87872b-c485-41e2-818b-e52a298b17a2",
"outputId": "d6700cb9-298f-468c-b937-42fadb421a43",
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
OLS Regression Results
\n",
"
\n",
"
Dep. Variable:
eval
R-squared:
0.072
\n",
"
\n",
"
\n",
"
Model:
OLS
Adj. R-squared:
0.066
\n",
"
\n",
"
\n",
"
No. Observations:
463
F-statistic:
11.82
\n",
"
\n",
"
\n",
"
Covariance Type:
nonrobust
Prob (F-statistic):
1.80e-07
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
coef
std err
t
P>|t|
[0.025
0.975]
\n",
"
\n",
"
\n",
"
Intercept
3.8600
0.076
50.890
0.000
3.711
4.009
\n",
"
\n",
"
\n",
"
gender[T.male]
0.5362
0.106
5.047
0.000
0.327
0.745
\n",
"
\n",
"
\n",
"
tenure[T.yes]
0.0552
0.088
0.627
0.531
-0.118
0.228
\n",
"
\n",
"
\n",
"
gender[T.male]:tenure[T.yes]
-0.4610
0.121
-3.816
0.000
-0.699
-0.224
\n",
"
\n",
"
Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/latex": [
"\\begin{center}\n",
"\\begin{tabular}{lclc}\n",
"\\toprule\n",
"\\textbf{Dep. Variable:} & eval & \\textbf{ R-squared: } & 0.072 \\\\\n",
"\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.066 \\\\\n",
"\\textbf{No. Observations:} & 463 & \\textbf{ F-statistic: } & 11.82 \\\\\n",
"\\textbf{Covariance Type:} & nonrobust & \\textbf{ Prob (F-statistic):} & 1.80e-07 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"\\begin{tabular}{lcccccc}\n",
" & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n",
"\\midrule\n",
"\\textbf{Intercept} & 3.8600 & 0.076 & 50.890 & 0.000 & 3.711 & 4.009 \\\\\n",
"\\textbf{gender[T.male]} & 0.5362 & 0.106 & 5.047 & 0.000 & 0.327 & 0.745 \\\\\n",
"\\textbf{tenure[T.yes]} & 0.0552 & 0.088 & 0.627 & 0.531 & -0.118 & 0.228 \\\\\n",
"\\textbf{gender[T.male]:tenure[T.yes]} & -0.4610 & 0.121 & -3.816 & 0.000 & -0.699 & -0.224 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"%\\caption{OLS Regression Results}\n",
"\\end{center}\n",
"\n",
"Notes: \\newline\n",
" [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/plain": [
"\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: eval R-squared: 0.072\n",
"Model: OLS Adj. R-squared: 0.066\n",
"No. Observations: 463 F-statistic: 11.82\n",
"Covariance Type: nonrobust Prob (F-statistic): 1.80e-07\n",
"================================================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------------------------\n",
"Intercept 3.8600 0.076 50.890 0.000 3.711 4.009\n",
"gender[T.male] 0.5362 0.106 5.047 0.000 0.327 0.745\n",
"tenure[T.yes] 0.0552 0.088 0.627 0.531 -0.118 0.228\n",
"gender[T.male]:tenure[T.yes] -0.4610 0.121 -3.816 0.000 -0.699 -0.224\n",
"================================================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"\"\"\""
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Your answer here\n",
"# A linear model equivalent\n",
"anova_model = smf.ols('eval ~ gender * tenure', data=profs).fit()\n",
"anova_model.summary(slim=True)"
]
},
{
"cell_type": "markdown",
"id": "81931246-3836-4ad2-acc8-2a8001ca2971",
"metadata": {
"id": "81931246-3836-4ad2-acc8-2a8001ca2971"
},
"source": [
"With a fitted model, we can easily explore the implications via the predictions.\n",
"\n",
"First, make a datagrid that gives predictions for tenure and gender. Call it `anova_predmat`, and then use the model to predict those scores, storing them in a dataframe called `anova_predictions`."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "cb771ac4-c3cc-4ac6-87f8-b6c46b38172c",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 309
},
"editable": true,
"executionInfo": {
"elapsed": 7,
"status": "ok",
"timestamp": 1723926881782,
"user": {
"displayName": "Alex Jones",
"userId": "11094282981700434339"
},
"user_tz": -60
},
"id": "cb771ac4-c3cc-4ac6-87f8-b6c46b38172c",
"outputId": "b0babe63-dba5-42b7-83f7-8a0d60fb242c",
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# Your answer here\n",
"# Prediction grid\n",
"anova_predmat = me.datagrid(anova_model,\n",
" tenure=['yes', 'no'],\n",
" gender=['male', 'female'])\n",
"\n",
"# Output\n",
"anova_predictions = me.predictions(anova_model, newdata=anova_predmat)"
]
},
{
"cell_type": "markdown",
"id": "dfbe945e-3969-4513-8cfe-3e9d9279abf8",
"metadata": {
"id": "dfbe945e-3969-4513-8cfe-3e9d9279abf8"
},
"source": [
"It is always sensible to plot predictions before we begin interpretin them. Use `seaborn` to create a line plot that illustrates the interaction. Any way you want is fine - as long as the estimate is on the y axis."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "71c09c74-134f-4e35-a722-e9232800ce21",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 159
},
"editable": true,
"executionInfo": {
"elapsed": 216,
"status": "ok",
"timestamp": 1723927069461,
"user": {
"displayName": "Alex Jones",
"userId": "11094282981700434339"
},
"user_tz": -60
},
"id": "os0rDvsqDUV-",
"outputId": "c7b8f145-dd1a-4cc4-8119-ec38128e7322",
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Your answer here\n",
"# plot\n",
"sns.lineplot(data=anova_predictions,\n",
" y='estimate', x='tenure',\n",
" hue='gender')"
]
},
{
"cell_type": "markdown",
"id": "058ddc5f-eac2-4bc8-88fe-d3593a287224",
"metadata": {},
"source": [
"The ANOVA suggested we had the following results:\n",
"1. A main effect of gender (differences between men and women, ignoring tenure status)\n",
"2. A main effect of tenure (differences between tenured and non-tenured, ignoring gender)\n",
"3. An interaction, indicating the difference between one variable (e.g. gender) at one level of the other (say tenured) is different to the other (confusing!)\n",
"\n",
"Have the model make predictions, and to explore the main effects, use the `by` keyword to ignore one variable and the `hypothesis` keyword to check the differences."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "748d0f0b-2dec-4771-ab4d-0ad0e44f1e7e",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"shape: (1, 8)
term
estimate
std_error
statistic
p_value
s_value
conf_low
conf_high
str
f64
f64
f64
f64
f64
f64
f64
"Row 1 - Row 2"
0.175352
0.060417
2.902376
0.003703
8.076918
0.056937
0.293766
"
],
"text/plain": [
"shape: (1, 8)\n",
"┌───────────────┬──────────┬───────────┬─────┬─────────┬──────┬────────┬───────┐\n",
"│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n",
"╞═══════════════╪══════════╪═══════════╪═════╪═════════╪══════╪════════╪═══════╡\n",
"│ Row 1 - Row 2 ┆ 0.175 ┆ 0.0604 ┆ 2.9 ┆ 0.0037 ┆ 8.08 ┆ 0.0569 ┆ 0.294 │\n",
"└───────────────┴──────────┴───────────┴─────┴─────────┴──────┴────────┴───────┘\n",
"\n",
"Columns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Your answer here\n",
"# Main effects\n",
"# Gender\n",
"me.predictions(anova_model, newdata=anova_predmat, by='gender', hypothesis='pairwise')\n",
"\n",
"# Tenure\n",
"me.predictions(anova_model, newdata=anova_predmat, by='tenure', hypothesis='pairwise')"
]
},
{
"cell_type": "markdown",
"id": "39c2d04c-5d51-42eb-a11c-12bf50d0c9a6",
"metadata": {},
"source": [
"Now use the predictions to figure out the 'cause' of the interaction. There are a few ways to do this. You can compare men and women professors who are tenured, and see if that difference is significant, and then see if the difference between non-tenured professors is also significant. What do you observe?"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "50197fde-97b4-4ab9-826d-bdc0d4b5d3b2",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"shape: (2, 10)
gender
term
contrast
estimate
std_error
statistic
p_value
s_value
conf_low
conf_high
str
str
str
f64
f64
f64
f64
f64
f64
f64
"female"
"tenure"
"mean(yes) - mean(no)"
0.055172
0.08796
0.627241
0.530501
0.914573
-0.117227
0.227572
"male"
"tenure"
"mean(yes) - mean(no)"
-0.405876
0.082847
-4.899093
9.6280e-7
19.98626
-0.568254
-0.243499
"
],
"text/plain": [
"shape: (2, 10)\n",
"┌────────┬────────┬──────────────────────┬──────────┬───┬──────────┬───────┬────────┬────────┐\n",
"│ gender ┆ Term ┆ Contrast ┆ Estimate ┆ … ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ str ┆ str ┆ str ┆ str ┆ ┆ str ┆ str ┆ str ┆ str │\n",
"╞════════╪════════╪══════════════════════╪══════════╪═══╪══════════╪═══════╪════════╪════════╡\n",
"│ female ┆ tenure ┆ mean(yes) - mean(no) ┆ 0.0552 ┆ … ┆ 0.531 ┆ 0.915 ┆ -0.117 ┆ 0.228 │\n",
"│ male ┆ tenure ┆ mean(yes) - mean(no) ┆ -0.406 ┆ … ┆ 9.63e-07 ┆ 20 ┆ -0.568 ┆ -0.243 │\n",
"└────────┴────────┴──────────────────────┴──────────┴───┴──────────┴───────┴────────┴────────┘\n",
"\n",
"Columns: gender, term, contrast, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Your answer here\n",
"me.predictions(anova_model, newdata=anova_predmat, hypothesis='b1=b2') # Tenured, NON significant\n",
"me.predictions(anova_model, newdata=anova_predmat, hypothesis='b3=b4') # Non-tenured, significant, males > females\n",
"\n",
"# Slopes\n",
"me.slopes(anova_model, newdata=anova_predmat, variables='tenure', by='gender')"
]
},
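{
"cell_type": "markdown",
"id": "sketch-interaction-from-coefs",
"metadata": {},
"source": [
"The same conclusions can be read off the `anova_model` coefficients reported in its summary. As a rough check that is not part of the exercise, the sketch below rebuilds the four cell means by hand from those rounded coefficients and shows that the interaction coefficient is the difference between the two gender gaps. The helper `predict` and the dictionary `cell_means` are names introduced here purely for illustration.\n",
"\n",
"```python\n",
"# Rounded coefficients copied from the anova_model summary\n",
"intercept = 3.8600       # non-tenured female (reference cell)\n",
"b_male = 0.5362          # gender[T.male]\n",
"b_tenure = 0.0552        # tenure[T.yes]\n",
"b_interaction = -0.4610  # gender[T.male]:tenure[T.yes]\n",
"\n",
"def predict(male, tenured):\n",
"    # Predicted evaluation for 0/1 coded gender and tenure status\n",
"    return (intercept + b_male * male + b_tenure * tenured\n",
"            + b_interaction * male * tenured)\n",
"\n",
"# One predicted mean per combination of gender and tenure\n",
"cell_means = {(g, t): predict(int(g == 'male'), int(t == 'yes'))\n",
"              for g in ('female', 'male') for t in ('no', 'yes')}\n",
"\n",
"gap_non_tenured = cell_means[('male', 'no')] - cell_means[('female', 'no')]\n",
"gap_tenured = cell_means[('male', 'yes')] - cell_means[('female', 'yes')]\n",
"tenure_effect_male = cell_means[('male', 'yes')] - cell_means[('male', 'no')]\n",
"\n",
"print(round(gap_non_tenured, 4))                # gender gap without tenure\n",
"print(round(gap_tenured, 4))                    # gender gap with tenure\n",
"print(round(gap_tenured - gap_non_tenured, 4))  # equals the interaction term\n",
"print(round(tenure_effect_male, 4))             # tenure effect among males\n",
"```\n",
"\n",
"The gender gap is large for non-tenured professors and close to zero for tenured ones, and the tenure effect among males agrees with the `me.slopes` estimate for males (-0.406) up to rounding."
]
},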
{
"cell_type": "markdown",
"id": "4a9d8c72-61ef-47d9-a4ad-a1439fd6bdce",
"metadata": {},
"source": [
"### e. ANCOVA done with marginal effects\n",
"Let us now add some complexity. ANCOVA is often described as an ANOVA 'adjusting' for another variable. We know it simply as a general linear model, with some kind of categorical predictor, and other continuous predictors that are also in the model. There can be as many categorical predictors and interactions between them as needed, as well as the continuous covariates.\n",
"\n",
"ANCOVA is a confusing and unnecessary term. Linear models are simpler, and here we will see how. \n",
"\n",
"First, carry out an ANCOVA with `pingouin` that looks at teaching evaluations between men and women (the categorical predictor), but adjusts for their beauty (the continuous covariate). Print the result. What does it tell you?\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "dd07d46a-4774-4d96-bc93-0f6348da1299",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Source
\n",
"
SS
\n",
"
DF
\n",
"
F
\n",
"
p-unc
\n",
"
np2
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
gender
\n",
"
4.346745
\n",
"
1
\n",
"
15.055490
\n",
"
0.000120
\n",
"
0.031692
\n",
"
\n",
"
\n",
"
1
\n",
"
beauty
\n",
"
6.243877
\n",
"
1
\n",
"
21.626444
\n",
"
0.000004
\n",
"
0.044903
\n",
"
\n",
"
\n",
"
2
\n",
"
Residual
\n",
"
132.808865
\n",
"
460
\n",
"
NaN
\n",
"
NaN
\n",
"
NaN
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Source SS DF F p-unc np2\n",
"0 gender 4.346745 1 15.055490 0.000120 0.031692\n",
"1 beauty 6.243877 1 21.626444 0.000004 0.044903\n",
"2 Residual 132.808865 460 NaN NaN NaN"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Your answer here\n",
"# ANCOVA in pingouin\n",
"pg.ancova(data=profs, dv='eval', between='gender', covar='beauty')"
]
},
{
"cell_type": "markdown",
"id": "f451aa14-65aa-44b0-bce4-b6cb7f7a1f66",
"metadata": {},
"source": [
"You should see that there are significant effects of both gender and beauty, but there's little information of use here. \n",
"\n",
"Fit a linear model that is equivalent to this ANCOVA, called `ancova_mod`. Print the summary."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "2dac5a1f-0aeb-489e-aaf2-d59f67b7c76f",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
OLS Regression Results
\n",
"
\n",
"
Dep. Variable:
eval
R-squared:
0.066
\n",
"
\n",
"
\n",
"
Model:
OLS
Adj. R-squared:
0.062
\n",
"
\n",
"
\n",
"
No. Observations:
463
F-statistic:
16.33
\n",
"
\n",
"
\n",
"
Covariance Type:
nonrobust
Prob (F-statistic):
1.41e-07
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
coef
std err
t
P>|t|
[0.025
0.975]
\n",
"
\n",
"
\n",
"
Intercept
3.8838
0.039
100.468
0.000
3.808
3.960
\n",
"
\n",
"
\n",
"
gender[T.male]
0.1978
0.051
3.880
0.000
0.098
0.298
\n",
"
\n",
"
\n",
"
beauty
0.1486
0.032
4.650
0.000
0.086
0.211
\n",
"
\n",
"
Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/latex": [
"\\begin{center}\n",
"\\begin{tabular}{lclc}\n",
"\\toprule\n",
"\\textbf{Dep. Variable:} & eval & \\textbf{ R-squared: } & 0.066 \\\\\n",
"\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.062 \\\\\n",
"\\textbf{No. Observations:} & 463 & \\textbf{ F-statistic: } & 16.33 \\\\\n",
"\\textbf{Covariance Type:} & nonrobust & \\textbf{ Prob (F-statistic):} & 1.41e-07 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"\\begin{tabular}{lcccccc}\n",
" & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n",
"\\midrule\n",
"\\textbf{Intercept} & 3.8838 & 0.039 & 100.468 & 0.000 & 3.808 & 3.960 \\\\\n",
"\\textbf{gender[T.male]} & 0.1978 & 0.051 & 3.880 & 0.000 & 0.098 & 0.298 \\\\\n",
"\\textbf{beauty} & 0.1486 & 0.032 & 4.650 & 0.000 & 0.086 & 0.211 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"%\\caption{OLS Regression Results}\n",
"\\end{center}\n",
"\n",
"Notes: \\newline\n",
" [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/plain": [
"\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: eval R-squared: 0.066\n",
"Model: OLS Adj. R-squared: 0.062\n",
"No. Observations: 463 F-statistic: 16.33\n",
"Covariance Type: nonrobust Prob (F-statistic): 1.41e-07\n",
"==================================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"----------------------------------------------------------------------------------\n",
"Intercept 3.8838 0.039 100.468 0.000 3.808 3.960\n",
"gender[T.male] 0.1978 0.051 3.880 0.000 0.098 0.298\n",
"beauty 0.1486 0.032 4.650 0.000 0.086 0.211\n",
"==================================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"\"\"\""
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Your answer here\n",
"ancova_mod = smf.ols('eval ~ gender + beauty', data=profs).fit()\n",
"ancova_mod.summary(slim=True)"
]
},
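{
"cell_type": "markdown",
"id": "sketch-ancova-constant-gap",
"metadata": {},
"source": [
"To see what adjusting for beauty means in this model, the sketch below plugs the rounded coefficients from the `ancova_mod` summary into the regression equation by hand. It is an illustration only: the helper `predict` and the list `beauty_values` are hypothetical names and values chosen here, not part of the exercise.\n",
"\n",
"```python\n",
"# Rounded coefficients copied from the ancova_mod summary\n",
"intercept = 3.8838\n",
"b_male = 0.1978    # gender[T.male]\n",
"b_beauty = 0.1486  # beauty\n",
"\n",
"def predict(male, beauty):\n",
"    # Predicted evaluation for 0/1 coded gender at a given beauty score\n",
"    return intercept + b_male * male + b_beauty * beauty\n",
"\n",
"# Hypothetical beauty scores at which to compare the two genders\n",
"beauty_values = [-1.0, 0.0, 1.5]\n",
"gaps = [predict(1, b) - predict(0, b) for b in beauty_values]\n",
"\n",
"for b, gap in zip(beauty_values, gaps):\n",
"    print(b, round(gap, 4))\n",
"```\n",
"\n",
"Because the model has no gender-by-beauty interaction, the male-female gap equals the `gender[T.male]` coefficient at every beauty value, so a prediction grid for the adjusted gender difference only needs to vary gender."
]
},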
{
"cell_type": "markdown",
"id": "42ec728f-3e19-4308-89f9-d1bed94dbcd1",
"metadata": {},
"source": [
"Now make a prediction about teaching evaluations for females and males. As the variable we want to control for is in the model (beauty), we don't need to make any predictions for it."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "e5767d2b-27f9-407c-8596-2de46f959801",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"
"
],
"text/plain": [
"shape: (1, 8)\n",
"┌───────────────┬──────────┬───────────┬──────┬─────────┬──────┬────────┬───────┐\n",
"│ Term ┆ Estimate ┆ Std.Error ┆ z ┆ P(>|z|) ┆ S ┆ 2.5% ┆ 97.5% │\n",
"│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │\n",
"│ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str ┆ str │\n",
"╞═══════════════╪══════════╪═══════════╪══════╪═════════╪══════╪════════╪═══════╡\n",
"│ Row 1 - Row 2 ┆ 0.168 ┆ 0.0517 ┆ 3.25 ┆ 0.00115 ┆ 9.76 ┆ 0.0667 ┆ 0.269 │\n",
"└───────────────┴──────────┴───────────┴──────┴─────────┴──────┴────────┴───────┘\n",
"\n",
"Columns: term, estimate, std_error, statistic, p_value, s_value, conf_low, conf_high"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Your answer\n",
"# T-test style model\n",
"ancova_mod = smf.ols('eval ~ gender', data=profs).fit()\n",
"\n",
"# Predictions and contrast\n",
"me.predictions(ancova_mod, \n",
" hypothesis='pairwise',\n",
" newdata=me.datagrid(ancova_mod,\n",
" gender=['male', 'female'])\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "53187ff3-129e-4454-846c-f3684e570af8",
"metadata": {},
"source": [
"### f. Knowledge of linear models gets you out of trouble\n",
"Following on from the last example, lets say we want to examine the interaction between gender and tenure status and control for beauty. Perhaps we wish to see whether our earlier ANOVA model stands up if we incorporate and control for beauty.\n",
"\n",
"First, try to fit one of these models in `pingouin`. Its another ANCOVA, but this time has two between factors."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "6651ee2b-7edf-4569-a2b2-19395540440c",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"# Your answer here\n",
"#pg.ancova(data=profs, dv='eval', between=['tenure', 'gender'], covar='beauty')"
]
},
{
"cell_type": "markdown",
"id": "fcc7e531-bd48-4731-9d7c-ecc44b052536",
"metadata": {},
"source": [
"If you did this correctly, you should see an error - the software doesn't support it!\n",
"\n",
"But we can easily fit a linear model to do this. Fit a model that has an interaction between gender and tenure, and has beauty as a predictor."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "c9400fd1-5bec-4f6a-8f31-81f0b14de217",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
OLS Regression Results
\n",
"
\n",
"
Dep. Variable:
eval
R-squared:
0.104
\n",
"
\n",
"
\n",
"
Model:
OLS
Adj. R-squared:
0.096
\n",
"
\n",
"
\n",
"
No. Observations:
463
F-statistic:
13.23
\n",
"
\n",
"
\n",
"
Covariance Type:
nonrobust
Prob (F-statistic):
3.32e-10
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
coef
std err
t
P>|t|
[0.025
0.975]
\n",
"
\n",
"
\n",
"
Intercept
3.8804
0.075
51.883
0.000
3.733
4.027
\n",
"
\n",
"
\n",
"
gender[T.male]
0.4890
0.105
4.650
0.000
0.282
0.696
\n",
"
\n",
"
\n",
"
tenure[T.yes]
0.0076
0.087
0.087
0.930
-0.164
0.179
\n",
"
\n",
"
\n",
"
gender[T.male]:tenure[T.yes]
-0.3668
0.121
-3.027
0.003
-0.605
-0.129
\n",
"
\n",
"
\n",
"
beauty
0.1289
0.032
4.032
0.000
0.066
0.192
\n",
"
\n",
"
Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/latex": [
"\\begin{center}\n",
"\\begin{tabular}{lclc}\n",
"\\toprule\n",
"\\textbf{Dep. Variable:} & eval & \\textbf{ R-squared: } & 0.104 \\\\\n",
"\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.096 \\\\\n",
"\\textbf{No. Observations:} & 463 & \\textbf{ F-statistic: } & 13.23 \\\\\n",
"\\textbf{Covariance Type:} & nonrobust & \\textbf{ Prob (F-statistic):} & 3.32e-10 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"\\begin{tabular}{lcccccc}\n",
" & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n",
"\\midrule\n",
"\\textbf{Intercept} & 3.8804 & 0.075 & 51.883 & 0.000 & 3.733 & 4.027 \\\\\n",
"\\textbf{gender[T.male]} & 0.4890 & 0.105 & 4.650 & 0.000 & 0.282 & 0.696 \\\\\n",
"\\textbf{tenure[T.yes]} & 0.0076 & 0.087 & 0.087 & 0.930 & -0.164 & 0.179 \\\\\n",
"\\textbf{gender[T.male]:tenure[T.yes]} & -0.3668 & 0.121 & -3.027 & 0.003 & -0.605 & -0.129 \\\\\n",
"\\textbf{beauty} & 0.1289 & 0.032 & 4.032 & 0.000 & 0.066 & 0.192 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"%\\caption{OLS Regression Results}\n",
"\\end{center}\n",
"\n",
"Notes: \\newline\n",
" [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/plain": [
"\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: eval R-squared: 0.104\n",
"Model: OLS Adj. R-squared: 0.096\n",
"No. Observations: 463 F-statistic: 13.23\n",
"Covariance Type: nonrobust Prob (F-statistic): 3.32e-10\n",
"================================================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"------------------------------------------------------------------------------------------------\n",
"Intercept 3.8804 0.075 51.883 0.000 3.733 4.027\n",
"gender[T.male] 0.4890 0.105 4.650 0.000 0.282 0.696\n",
"tenure[T.yes] 0.0076 0.087 0.087 0.930 -0.164 0.179\n",
"gender[T.male]:tenure[T.yes] -0.3668 0.121 -3.027 0.003 -0.605 -0.129\n",
"beauty 0.1289 0.032 4.032 0.000 0.066 0.192\n",
"================================================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"\"\"\""
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Your answer here\n",
"# More complex ANCOVA\n",
"ancova2 = smf.ols('eval ~ beauty + gender * tenure', data=profs).fit()\n",
"ancova2.summary(slim=True)"
]
},
{
"cell_type": "markdown",
"id": "b201bade-ae5e-48ba-a369-87af55f8fd05",
"metadata": {},
"source": [
"Once you have this model, use it to make predictions about gender and tenure as before, and work out the interaction effects. Are they the same as before?"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "af123534-3d80-42b2-b298-86ca46d7aff9",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Your answer here\n",
"sns.lineplot(data=me.predictions(ancova2, newdata=anova_predmat),\n",
" y='estimate', x='tenure',\n",
" hue='gender')"
]
},
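{
"cell_type": "markdown",
"id": "7d3e5a10-2b4c-4f6e-9a81-c0de5f1a2b3c",
"metadata": {},
"source": [
"As a rough check on the plot, the interaction can also be worked out by hand from the rounded coefficients in the `ancova2` summary. This is only a sketch: the symbols $\\beta_0, \\dots, \\beta_4$ are labels introduced here for illustration, and the figures are only as precise as the printed table. The model's prediction is\n",
"\n",
"$$\\widehat{\\text{eval}} = \\beta_0 + \\beta_1\\,\\text{male} + \\beta_2\\,\\text{tenure} + \\beta_3\\,(\\text{male} \\times \\text{tenure}) + \\beta_4\\,\\text{beauty}$$\n",
"\n",
"Holding beauty fixed, the male minus female gap is $\\beta_1 = 0.4890$ without tenure and $\\beta_1 + \\beta_3 = 0.4890 - 0.3668 = 0.1222$ with tenure. The difference between those two gaps is\n",
"\n",
"$$(\\beta_1 + \\beta_3) - \\beta_1 = \\beta_3 = -0.3668$$\n",
"\n",
"The $\\beta_4\\,\\text{beauty}$ term cancels in each of these differences, because both predictions being compared are made at the same value of beauty."
]
},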
{
"cell_type": "markdown",
"id": "96b774ec-7b88-48b5-abb9-4b92d245367c",
"metadata": {},
"source": [
"### g. Interpreting complex interactions with marginal effects\n",
"If you've completed the above exercises, you've mastered 99% of the statistics used in basic psychology, and learned how to do it from a much clearer perspective. Lets now build knowledge of how to interpret an even more complex model. \n",
"\n",
"Lets suppose that, rather than controlling for beauty's influence on teaching evaluations for tenured and non-tenured males and females, you want to know whether it influences evaluations at these combinations. That is, you might wish to see how less attractive males are evaluated before and after tenure, and whether this change is different for females, who are typically judged more harshly on their looks. \n",
"\n",
"To do this, you will need an interaction between gender, beauty, and tenure. Fit a model that does this, and call it `three_interact`. Print the summary."
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "964e9828-d394-4fba-9919-214f51e87a9b",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
OLS Regression Results
\n",
"
\n",
"
Dep. Variable:
eval
R-squared:
0.110
\n",
"
\n",
"
\n",
"
Model:
OLS
Adj. R-squared:
0.097
\n",
"
\n",
"
\n",
"
No. Observations:
463
F-statistic:
8.055
\n",
"
\n",
"
\n",
"
Covariance Type:
nonrobust
Prob (F-statistic):
3.03e-09
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
coef
std err
t
P>|t|
[0.025
0.975]
\n",
"
\n",
"
\n",
"
Intercept
3.8601
0.076
51.031
0.000
3.711
4.009
\n",
"
\n",
"
\n",
"
gender[T.male]
0.5076
0.107
4.741
0.000
0.297
0.718
\n",
"
\n",
"
\n",
"
tenure[T.yes]
0.0275
0.088
0.312
0.755
-0.146
0.201
\n",
"
\n",
"
\n",
"
gender[T.male]:tenure[T.yes]
-0.3781
0.122
-3.100
0.002
-0.618
-0.138
\n",
"
\n",
"
\n",
"
beauty
0.0006
0.080
0.008
0.994
-0.156
0.157
\n",
"
\n",
"
\n",
"
gender[T.male]:beauty
0.1362
0.124
1.094
0.274
-0.108
0.381
\n",
"
\n",
"
\n",
"
beauty:tenure[T.yes]
0.1301
0.099
1.315
0.189
-0.064
0.325
\n",
"
\n",
"
\n",
"
gender[T.male]:beauty:tenure[T.yes]
-0.0934
0.146
-0.640
0.522
-0.380
0.193
\n",
"
\n",
"
Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/latex": [
"\\begin{center}\n",
"\\begin{tabular}{lclc}\n",
"\\toprule\n",
"\\textbf{Dep. Variable:} & eval & \\textbf{ R-squared: } & 0.110 \\\\\n",
"\\textbf{Model:} & OLS & \\textbf{ Adj. R-squared: } & 0.097 \\\\\n",
"\\textbf{No. Observations:} & 463 & \\textbf{ F-statistic: } & 8.055 \\\\\n",
"\\textbf{Covariance Type:} & nonrobust & \\textbf{ Prob (F-statistic):} & 3.03e-09 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"\\begin{tabular}{lcccccc}\n",
" & \\textbf{coef} & \\textbf{std err} & \\textbf{t} & \\textbf{P$> |$t$|$} & \\textbf{[0.025} & \\textbf{0.975]} \\\\\n",
"\\midrule\n",
"\\textbf{Intercept} & 3.8601 & 0.076 & 51.031 & 0.000 & 3.711 & 4.009 \\\\\n",
"\\textbf{gender[T.male]} & 0.5076 & 0.107 & 4.741 & 0.000 & 0.297 & 0.718 \\\\\n",
"\\textbf{tenure[T.yes]} & 0.0275 & 0.088 & 0.312 & 0.755 & -0.146 & 0.201 \\\\\n",
"\\textbf{gender[T.male]:tenure[T.yes]} & -0.3781 & 0.122 & -3.100 & 0.002 & -0.618 & -0.138 \\\\\n",
"\\textbf{beauty} & 0.0006 & 0.080 & 0.008 & 0.994 & -0.156 & 0.157 \\\\\n",
"\\textbf{gender[T.male]:beauty} & 0.1362 & 0.124 & 1.094 & 0.274 & -0.108 & 0.381 \\\\\n",
"\\textbf{beauty:tenure[T.yes]} & 0.1301 & 0.099 & 1.315 & 0.189 & -0.064 & 0.325 \\\\\n",
"\\textbf{gender[T.male]:beauty:tenure[T.yes]} & -0.0934 & 0.146 & -0.640 & 0.522 & -0.380 & 0.193 \\\\\n",
"\\bottomrule\n",
"\\end{tabular}\n",
"%\\caption{OLS Regression Results}\n",
"\\end{center}\n",
"\n",
"Notes: \\newline\n",
" [1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
],
"text/plain": [
"\n",
"\"\"\"\n",
" OLS Regression Results \n",
"==============================================================================\n",
"Dep. Variable: eval R-squared: 0.110\n",
"Model: OLS Adj. R-squared: 0.097\n",
"No. Observations: 463 F-statistic: 8.055\n",
"Covariance Type: nonrobust Prob (F-statistic): 3.03e-09\n",
"=======================================================================================================\n",
" coef std err t P>|t| [0.025 0.975]\n",
"-------------------------------------------------------------------------------------------------------\n",
"Intercept 3.8601 0.076 51.031 0.000 3.711 4.009\n",
"gender[T.male] 0.5076 0.107 4.741 0.000 0.297 0.718\n",
"tenure[T.yes] 0.0275 0.088 0.312 0.755 -0.146 0.201\n",
"gender[T.male]:tenure[T.yes] -0.3781 0.122 -3.100 0.002 -0.618 -0.138\n",
"beauty 0.0006 0.080 0.008 0.994 -0.156 0.157\n",
"gender[T.male]:beauty 0.1362 0.124 1.094 0.274 -0.108 0.381\n",
"beauty:tenure[T.yes] 0.1301 0.099 1.315 0.189 -0.064 0.325\n",
"gender[T.male]:beauty:tenure[T.yes] -0.0934 0.146 -0.640 0.522 -0.380 0.193\n",
"=======================================================================================================\n",
"\n",
"Notes:\n",
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
"\"\"\""
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Your answer here\n",
"# three variable interaction\n",
"three_interact = smf.ols('eval ~ gender * beauty * tenure', data=profs).fit()\n",
"three_interact.summary(slim=True)"
]
},
{
"cell_type": "markdown",
"id": "b07f0191-41fa-47d4-9c4b-788077a41835",
"metadata": {},
"source": [
"Confusion abounds looking at the coefficients. Lets make sense of this by asking for predictions from the model. Generate a data grid that asks for beauty to be evaluated at [-2, 0, 2] (that's -2 standard devs below average, average, and 2 above, as the variable is scaled so by the authors), for tenured and non-tenured males and females."
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "ba8978f1-5faf-43c9-a7e7-d8416d1c9481",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"
"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Your answer here\n",
"# Predictions\n",
"three_preds = me.predictions(three_interact, newdata=predmat)\n",
"\n",
"# Plot to show interaction pattern overall - lots of ways to do this, e.g.\n",
"sns.lineplot(data=three_preds,\n",
" x='beauty', \n",
" y='estimate',\n",
" style='gender',\n",
" hue='tenure')\n",
"\n",
"# Or\n",
"sns.relplot(data=three_preds,\n",
" x='beauty', y='estimate',\n",
" style='tenure', col='gender',\n",
" kind='line')\n"
]
},
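{
"cell_type": "markdown",
"id": "3c9b7e42-6d1f-4a85-b2e0-5f8a1d4c7e90",
"metadata": {},
"source": [
"One way to untangle the coefficient table is to add the relevant terms into a beauty slope for each group. This is a sketch using the rounded values printed in the `three_interact` summary; the subscripted $\\beta$ symbols are labels introduced here for illustration ($b$ for beauty, $g$ for gender[T.male], $t$ for tenure[T.yes]).\n",
"\n",
"$$\\begin{aligned}\n",
"\\text{female, no tenure:} \\quad & \\beta_{b} = 0.0006 \\\\\\\\\n",
"\\text{male, no tenure:} \\quad & \\beta_{b} + \\beta_{g:b} = 0.0006 + 0.1362 = 0.1368 \\\\\\\\\n",
"\\text{female, tenure:} \\quad & \\beta_{b} + \\beta_{b:t} = 0.0006 + 0.1301 = 0.1307 \\\\\\\\\n",
"\\text{male, tenure:} \\quad & \\beta_{b} + \\beta_{g:b} + \\beta_{b:t} + \\beta_{g:b:t} = 0.0006 + 0.1362 + 0.1301 - 0.0934 = 0.1735\n",
"\\end{aligned}$$\n",
"\n",
"Each slope is the change in predicted evaluation per standard deviation of beauty within that group."
]
},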
{
"cell_type": "markdown",
"id": "9602f4ba-6845-414b-96f1-fb59f5e2a9e9",
"metadata": {},
"source": [
"If you have visualised it correctly, you should see the general pattern that male evaluations increase with beauty, but they are lower with tenure. Females on the other only show a positive beauty association *with* tenure, and no association without it.\n",
"\n",
"Now, are those differences meaningful? To test that we need to make a decision about how we want to evaluate our interaction. That depends on the question, since there are many ways to interpret interactions of this complexity.\n",
"\n",
"Do this in steps. First, is the association between beauty and evaluations different for females and males? "
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "855c8231-82f0-462e-b134-4cbf9e439ec3",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-input"
]
},
"outputs": [
{
"data": {
"text/html": [
"